{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vector Math"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we'll demo that word2vec-like properties are kept. You can download the vectors, follow along at home, and make your own queries if you'd like.\n",
    "\n",
    "Sums:\n",
    "\n",
    "1. `silicon valley ~ california + technology`  \n",
    "2. `uber ~ taxis + company`\n",
    "3. `baidu ~ china + search engine`\n",
    "\n",
    "Analogies:\n",
    "\n",
    "1. `Mark Zuckerberg - Facebook + Amazon = Jeff Bezos`\n",
    "1. `Hacker News - story + article = StackOverflow`\n",
    "1. `VIM - terminal + graphics = Photoshop`\n",
    "\n",
    "And slightly more whimsically:\n",
    "\n",
    "1. `vegeables - eat + drink = tea`\n",
    "2. `scala - features + simple = haskell`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2016-04-17 12:56:06--  https://zenodo.org/record/49903/files/vocab.npy\n",
      "Resolving zenodo.org (zenodo.org)... 188.184.66.202\n",
      "Connecting to zenodo.org (zenodo.org)|188.184.66.202|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 81754640 (78M) [application/octet-stream]\n",
      "Saving to: ‘vocab.npy’\n",
      "\n",
      "vocab.npy           100%[=====================>]  77.97M  9.21MB/s   in 23s    \n",
      "\n",
      "2016-04-17 12:56:32 (3.37 MB/s) - ‘vocab.npy’ saved [81754640/81754640]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!wget https://zenodo.org/record/49903/files/vocab.npy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2016-04-17 12:55:41--  https://zenodo.org/record/49903/files/word_vectors.npy\n",
      "Resolving zenodo.org (zenodo.org)... 188.184.66.202\n",
      "Connecting to zenodo.org (zenodo.org)|188.184.66.202|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 116273232 (111M) [application/octet-stream]\n",
      "Saving to: ‘word_vectors.npy’\n",
      "\n",
      "word_vectors.npy    100%[=====================>] 110.89M  6.64MB/s   in 21s    \n",
      "\n",
      "2016-04-17 12:56:06 (5.31 MB/s) - ‘word_vectors.npy’ saved [116273232/116273232]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!wget https://zenodo.org/record/49903/files/word_vectors.npy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You don't need to run the code below unless you've trained your own model. Otherwise, just download the word vectors from the URL above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#from lda2vec_model import LDA2Vec\n",
    "#from chainer import serializers\n",
    "#import numpy as np\n",
    "#import pandas as pd\n",
    "#import pickle\n",
    "#\n",
    "#features = pd.read_pickle(\"../data/features.pd\")\n",
    "#vocab = np.load(\"../data/vocab\")\n",
    "#npz = np.load(open('topics.story.pyldavis.npz', 'r'))\n",
    "#dat = {k: v for (k, v) in npz.iteritems()}\n",
    "#vocab = dat['vocab'].tolist()\n",
    "#dat = np.load(\"../data/data.npz\")\n",
    "#n_stories = features.story_id_codes.max() + 1\n",
    "#n_units = 256\n",
    "#n_vocab = dat['flattened'].max() + 1\n",
    "#model = LDA2Vec(n_stories=n_stories, n_story_topics=40,\n",
    "#                n_authors=5664, n_author_topics=20,\n",
    "#                n_units=n_units, n_vocab=n_vocab, counts=np.zeros(n_vocab),\n",
    "#                n_samples=15)\n",
    "#serializers.load_hdf5(\"/home/chris/lda2vec-12/examples/hacker_news/lda2vec/lda2vec.hdf5\", model)\n",
    "#np.save(\"word_vectors\", model.sampler.W.data)\n",
    "#np.save(\"vocab\", vocab)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "word_vectors_raw = np.load(\"word_vectors.npy\")\n",
    "vocab = np.load(\"vocab.npy\").tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "L2 Normalize the word vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "word_vectors = word_vectors_raw / np.linalg.norm(word_vectors_raw, axis=-1)[:, None]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def get_vector(token):\n",
    "    index = vocab.index(token)\n",
    "    return word_vectors[index, :].copy()\n",
    "\n",
    "def most_similar(token, n=20):\n",
    "    word_vector = get_vector(token)\n",
    "    similarities = np.dot(word_vectors, word_vector)\n",
    "    top = np.argsort(similarities)[::-1][:n]\n",
    "    return [vocab[i] for i in top]\n",
    "\n",
    "# This is Levy & Goldberg's 3Cosmul Metric\n",
    "# Based on the Gensim implementation: https://github.com/piskvorky/gensim/blob/master/gensim/models/word2vec.py\n",
    "def cosmul(positives, negatives, topn=20):\n",
    "    positive = [get_vector(p) for p in positives]\n",
    "    negative = [get_vector(n) for n in negatives]\n",
    "    pos_dists = [((1 + np.dot(word_vectors, term)) / 2.) for term in positive]\n",
    "    neg_dists = [((1 + np.dot(word_vectors, term)) / 2.) for term in negative]\n",
    "    dists = np.prod(pos_dists, axis=0) / (np.prod(neg_dists, axis=0) + 1e-6)\n",
    "    idxs = np.argsort(dists)[::-1][:topn]\n",
    "    return [vocab[i] for i in idxs if (vocab[i] not in positives) and (vocab[i] not in negatives)]\n",
    "def most_similar_posneg(positives, negatives, topn=20):\n",
    "    positive = np.sum([get_vector(p) for p in positives], axis=0)\n",
    "    negative = np.sum([get_vector(n) for n in negatives], axis=0)\n",
    "    vector = positive - negative\n",
    "    dists = np.dot(word_vectors, vector)\n",
    "    idxs = np.argsort(dists)[::-1][:topn]\n",
    "    return [vocab[i] for i in idxs if (vocab[i] not in positives) and (vocab[i] not in negatives)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'san francisco',\n",
       " u'new york',\n",
       " u'nyc',\n",
       " u'palo alto',\n",
       " u'mountain view',\n",
       " u'boston',\n",
       " u'seattle',\n",
       " u'sf',\n",
       " u'los angeles',\n",
       " u'new york city',\n",
       " u'london',\n",
       " u'ny',\n",
       " u'brooklyn',\n",
       " u'chicago',\n",
       " u'austin',\n",
       " u'atlanta',\n",
       " u'portland',\n",
       " u'san jose',\n",
       " u'san mateo',\n",
       " u'sunnyvale']"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "most_similar('san francisco')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'silicon valley',\n",
       " u'in',\n",
       " u'new york',\n",
       " u'u.s.',\n",
       " u'west',\n",
       " u'tech',\n",
       " u'usa',\n",
       " u'san francisco',\n",
       " u'japan',\n",
       " u'america',\n",
       " u'dc',\n",
       " u'industry',\n",
       " u'canada',\n",
       " u'new york city',\n",
       " u'nyc',\n",
       " u'area',\n",
       " u'valley',\n",
       " u'china']"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['california', 'technology'], [], topn=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'currencies',\n",
       " u'bitcoin',\n",
       " u'goods',\n",
       " u'physical',\n",
       " u'gold',\n",
       " u'fiat',\n",
       " u'trading',\n",
       " u'cryptocurrency',\n",
       " u'bitcoins',\n",
       " u'electronic',\n",
       " u'analog',\n",
       " u'transfers',\n",
       " u'banking',\n",
       " u'commodity',\n",
       " u'mining',\n",
       " u'virtual currency',\n",
       " u'other currencies',\n",
       " u'media']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['digital', 'currency'], [], topn=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'vim',\n",
       " u'emacs',\n",
       " u'editor',\n",
       " u'sublime',\n",
       " u'tmux',\n",
       " u'shell',\n",
       " u'iterm',\n",
       " u'vi',\n",
       " u'ide',\n",
       " u'debugger',\n",
       " u'latex',\n",
       " u'gui',\n",
       " u'gvim',\n",
       " u'notepad',\n",
       " u'eclipse',\n",
       " u'command line',\n",
       " u'terminal.app',\n",
       " u'window manager']"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['text editor', 'terminal'], [], topn=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'russia',\n",
       " u'india',\n",
       " u'japan',\n",
       " u'africa',\n",
       " u'korea',\n",
       " u'germany',\n",
       " u'other countries',\n",
       " u'asia',\n",
       " u'ukraine',\n",
       " u'iran',\n",
       " u'brazil',\n",
       " u'israel',\n",
       " u'usa',\n",
       " u'vietnam',\n",
       " u'france',\n",
       " u'countries',\n",
       " u'south korea',\n",
       " u'hong kong',\n",
       " u'europe']"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['china'], [], topn=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'baidu',\n",
       " u'google',\n",
       " u'google search',\n",
       " u'india',\n",
       " u'russia',\n",
       " u'japan',\n",
       " u'iran',\n",
       " u'country',\n",
       " u'yandex',\n",
       " u'africa',\n",
       " u'duckduckgo',\n",
       " u'south korea',\n",
       " u'bing',\n",
       " u'france',\n",
       " u'beijing',\n",
       " u'hong kong',\n",
       " u'great firewall',\n",
       " u'search engines']"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['china', 'search engine'], [], topn=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'apple',\n",
       " u'ms',\n",
       " u'msft',\n",
       " u'google',\n",
       " u'nokia',\n",
       " u'adobe',\n",
       " u'samsung',\n",
       " u'hp',\n",
       " u'rim',\n",
       " u'oracle',\n",
       " u'valve',\n",
       " u'mozilla',\n",
       " u'ibm',\n",
       " u'motorola',\n",
       " u'oems',\n",
       " u'ballmer',\n",
       " u'intel',\n",
       " u'ms.',\n",
       " u'canonical']"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['microsoft'], [], topn=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'apple',\n",
       " u'google',\n",
       " u'enterprise',\n",
       " u'azure',\n",
       " u'ms',\n",
       " u'skydrive',\n",
       " u'sharepoint',\n",
       " u'walled garden',\n",
       " u'icloud',\n",
       " u'oracle',\n",
       " u'chrome os',\n",
       " u'cloud services',\n",
       " u'android market',\n",
       " u'adobe',\n",
       " u'app store',\n",
       " u'rackspace',\n",
       " u'hp',\n",
       " u'samsung']"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['microsoft', 'cloud'], [], topn=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Queen is several rankings down, so not exactly the same as out of the box word2vec!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'professional context',\n",
       " u'female',\n",
       " u'pawn',\n",
       " u'content farm',\n",
       " u'queen',\n",
       " u'career trajectory',\n",
       " u'real risk',\n",
       " u'philadelphia',\n",
       " u'teen',\n",
       " u'shitty place',\n",
       " u'prussia',\n",
       " u'criminal offense',\n",
       " u'main theme',\n",
       " u'she',\n",
       " u'magician',\n",
       " u'gray area',\n",
       " u'herself',\n",
       " u'best site']"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cosmul(['king', 'woman'], ['man'], topn=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most similar\n",
      "mark zuckerberg\n",
      "bill gates\n",
      "zuckerberg\n",
      "larry page\n",
      "zuck\n",
      "steve jobs\n",
      "sergey brin\n",
      "jeff bezos\n",
      "gates\n",
      "warren buffet\n",
      "ceo\n",
      "peter thiel\n",
      "paul allen\n",
      "sean parker\n",
      "jack dorsey\n",
      "paul graham\n",
      "richard branson\n",
      "sergey\n",
      "linus torvalds\n",
      "larry ellison\n",
      "\n",
      "Cosmul\n",
      "jeff bezos\n",
      "elon musk\n",
      "warren buffet\n",
      "bezos\n",
      "michael dell\n",
      "bill gates\n",
      "musk\n",
      "hp\n",
      "toshiba\n",
      "dell\n",
      "richard branson\n",
      "elon\n",
      "buffet\n",
      "john carmack\n",
      "steve wozniak\n",
      "asus\n",
      "ford\n",
      "morgan\n",
      "\n",
      "Traditional Similarity\n",
      "jeff bezos\n",
      "bill gates\n",
      "elon musk\n",
      "bezos\n",
      "warren buffet\n",
      "michael dell\n",
      "hp\n",
      "musk\n",
      "richard branson\n",
      "dell\n",
      "toshiba\n",
      "john carmack\n",
      "buffet\n",
      "peter thiel\n",
      "steve wozniak\n",
      "gates\n",
      "steve jobs\n",
      "ford\n"
     ]
    }
   ],
   "source": [
    "print 'Most similar'\n",
    "print '\\n'.join(most_similar('mark zuckerberg'))\n",
    "print '\\nCosmul'\n",
    "pos = ['mark zuckerberg', 'amazon']\n",
    "neg = ['facebook']\n",
    "print '\\n'.join(cosmul(pos, neg, topn=20))\n",
    "print '\\nTraditional Similarity'\n",
    "print '\\n'.join(most_similar_posneg(pos, neg, topn=20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most similar\n",
      "hacker news\n",
      "hn\n",
      "hn.\n",
      "reddit\n",
      "front page\n",
      "hackernews\n",
      "commenting\n",
      "posted\n",
      "frontpage\n",
      "comment\n",
      "posting\n",
      "upvoted\n",
      "slashdot\n",
      "news.yc\n",
      "comments\n",
      "posts\n",
      "proggit\n",
      "post\n",
      "techcrunch\n",
      "top story\n",
      "\n",
      "Cosmul\n",
      "stack overflow\n",
      "stackoverflow\n",
      "answers\n",
      "answering\n",
      "answer\n",
      "questions\n",
      "quora\n",
      "answered\n",
      "ask\n",
      "hn\n",
      "other questions\n",
      "other question\n",
      "programming questions\n",
      "asking\n",
      "stackexchange\n",
      "stack exchange\n",
      "why\n",
      "basic questions\n",
      "\n",
      "Traditional Similarity\n",
      "stack overflow\n",
      "answer\n",
      "stackoverflow\n",
      "answering\n",
      "answers\n",
      "hn\n",
      "questions\n",
      "answered\n",
      "quora\n",
      "ask\n",
      "asking\n",
      "other question\n",
      "other questions\n",
      "first question\n",
      "stackexchange\n",
      "hn.\n",
      "programming questions\n",
      "hackernews\n"
     ]
    }
   ],
   "source": [
    "pos = ['hacker news', 'question']\n",
    "neg = ['story']\n",
    "\n",
    "print 'Most similar'\n",
    "print '\\n'.join(most_similar(pos[0]))\n",
    "print '\\nCosmul'\n",
    "print '\\n'.join(cosmul(pos, neg, topn=20))\n",
    "print '\\nTraditional Similarity'\n",
    "print '\\n'.join(most_similar_posneg(pos, neg, topn=20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most similar\n",
      "san francisco\n",
      "new york\n",
      "nyc\n",
      "palo alto\n",
      "mountain view\n",
      "boston\n",
      "seattle\n",
      "sf\n",
      "los angeles\n",
      "new york city\n",
      "london\n",
      "ny\n",
      "brooklyn\n",
      "chicago\n",
      "austin\n",
      "atlanta\n",
      "portland\n",
      "san jose\n",
      "san mateo\n",
      "sunnyvale\n",
      "\n",
      "Cosmul\n",
      "new york\n",
      "nyc\n",
      "palo alto\n",
      "mountain view\n",
      "boston\n",
      "seattle\n",
      "sf\n",
      "los angeles\n",
      "new york city\n",
      "london\n",
      "ny\n",
      "brooklyn\n",
      "chicago\n",
      "austin\n",
      "atlanta\n",
      "portland\n",
      "san jose\n",
      "san mateo\n",
      "sunnyvale\n",
      "\n",
      "Traditional Similarity\n",
      "new york\n",
      "nyc\n",
      "palo alto\n",
      "mountain view\n",
      "boston\n",
      "seattle\n",
      "sf\n",
      "los angeles\n",
      "new york city\n",
      "london\n",
      "ny\n",
      "brooklyn\n",
      "chicago\n",
      "austin\n",
      "atlanta\n",
      "portland\n",
      "san jose\n",
      "san mateo\n",
      "sunnyvale\n"
     ]
    }
   ],
   "source": [
    "pos = ['san francisco']\n",
    "neg = []\n",
    "\n",
    "print 'Most similar'\n",
    "print '\\n'.join(most_similar(pos[0]))\n",
    "print '\\nCosmul'\n",
    "print '\\n'.join(cosmul(pos, neg, topn=20))\n",
    "print '\\nTraditional Similarity'\n",
    "print '\\n'.join(most_similar_posneg(pos, neg, topn=20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most similar\n",
      "nlp\n",
      "machine learning\n",
      "data mining\n",
      "computer vision\n",
      "natural language processing\n",
      "ml\n",
      "image processing\n",
      "analytics\n",
      "classification\n",
      "algorithms\n",
      "data science\n",
      "hadoop\n",
      "analysis\n",
      "ai\n",
      "clustering\n",
      "mapreduce\n",
      "algorithm design\n",
      "information retrieval\n",
      "data analysis\n",
      "statistical\n",
      "\n",
      "Cosmul\n",
      "computer vision\n",
      "machine learning\n",
      "data mining\n",
      "image processing\n",
      "ai\n",
      "analytics\n",
      "algorithm\n",
      "randomized\n",
      "classification\n",
      "natural language processing\n",
      "hadoop\n",
      "engine\n",
      "statistical\n",
      "analysis\n",
      "machine\n",
      "clustering\n",
      "ml\n",
      "artificial intelligence\n",
      "neo4j\n",
      "\n",
      "Traditional Similarity\n",
      "computer vision\n",
      "machine learning\n",
      "data mining\n",
      "image processing\n",
      "ai\n",
      "analytics\n",
      "algorithm\n",
      "natural language processing\n",
      "classification\n",
      "randomized\n",
      "analysis\n",
      "ml\n",
      "hadoop\n",
      "engine\n",
      "machine\n",
      "statistical\n",
      "clustering\n",
      "visualization\n"
     ]
    }
   ],
   "source": [
    "pos = ['nlp', 'image']\n",
    "neg = ['text']\n",
    "\n",
    "print 'Most similar'\n",
    "print '\\n'.join(most_similar(pos[0]))\n",
    "print '\\nCosmul'\n",
    "print '\\n'.join(cosmul(pos, neg, topn=20))\n",
    "print '\\nTraditional Similarity'\n",
    "print '\\n'.join(most_similar_posneg(pos, neg, topn=20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most similar\n",
      "vim\n",
      "emacs\n",
      "vi\n",
      "sublime\n",
      "tmux\n",
      "textmate\n",
      "eclipse\n",
      "sublime text\n",
      "macvim\n",
      "zsh\n",
      "org-mode\n",
      "terminal\n",
      "st2\n",
      "bbedit\n",
      "intellij\n",
      "text editor\n",
      "latex\n",
      "notepad++\n",
      "netbeans\n",
      "other editors\n",
      "\n",
      "Cosmul\n",
      "photoshop\n",
      "animations\n",
      "typography\n",
      "programming\n",
      "layout\n",
      "textures\n",
      "web design\n",
      "fonts\n",
      "coding\n",
      "illustrator\n",
      "common lisp\n",
      "design\n",
      "prototyping\n",
      "canvas\n",
      "css.\n",
      "css\n",
      "diagrams\n",
      "vector graphics\n",
      "usability\n",
      "\n",
      "Traditional Similarity\n",
      "photoshop\n",
      "animations\n",
      "textures\n",
      "layout\n",
      "typography\n",
      "programming\n",
      "fonts\n",
      "coding\n",
      "illustrator\n",
      "design\n",
      "web design\n",
      "common lisp\n",
      "canvas\n",
      "photography\n",
      "ides\n",
      "visual\n",
      "animation\n",
      "css\n"
     ]
    }
   ],
   "source": [
    "pos = ['vim', 'graphics']\n",
    "neg = ['terminal']\n",
    "\n",
    "print 'Most similar'\n",
    "print '\\n'.join(most_similar(pos[0]))\n",
    "print '\\nCosmul'\n",
    "print '\\n'.join(cosmul(pos, neg, topn=20))\n",
    "print '\\nTraditional Similarity'\n",
    "print '\\n'.join(most_similar_posneg(pos, neg, topn=20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most similar\n",
      "vegetables\n",
      "meat\n",
      "rice\n",
      "meats\n",
      "fruit\n",
      "veggies\n",
      "pasta\n",
      "salads\n",
      "eat\n",
      "fruits\n",
      "cheese\n",
      "carrots\n",
      "potatoes\n",
      "beans\n",
      "seafood\n",
      "soy\n",
      "yogurt\n",
      "spices\n",
      "dairy\n",
      "fats\n",
      "\n",
      "Cosmul\n",
      "tea\n",
      "coffee\n",
      "beer\n",
      "drinking\n",
      "red wine\n",
      "soda\n",
      "cup\n",
      "alcohol\n",
      "cups\n",
      "vodka\n",
      "rice\n",
      "fruit\n",
      "whisky\n",
      "orange juice\n",
      "milk\n",
      "espresso\n",
      "drinks\n",
      "carrots\n",
      "\n",
      "Traditional Similarity\n",
      "tea\n",
      "coffee\n",
      "beer\n",
      "drinking\n",
      "soda\n",
      "red wine\n",
      "cup\n",
      "alcohol\n",
      "rice\n",
      "cups\n",
      "fruit\n",
      "vodka\n",
      "milk\n",
      "drinks\n",
      "orange juice\n",
      "carrots\n",
      "whisky\n",
      "pasta\n"
     ]
    }
   ],
   "source": [
    "pos = ['vegetables', 'drink']\n",
    "neg = ['eat']\n",
    "\n",
    "print 'Most similar'\n",
    "print '\\n'.join(most_similar(pos[0]))\n",
    "print '\\nCosmul'\n",
    "print '\\n'.join(cosmul(pos, neg, topn=20))\n",
    "print '\\nTraditional Similarity'\n",
    "print '\\n'.join(most_similar_posneg(pos, neg, topn=20))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most similar\n",
      "lda\n",
      "linear\n",
      "kmeans\n",
      "clustering\n",
      "-2\n",
      "176\n",
      "classification\n",
      "svm\n",
      "10000000\n",
      "minaway\n",
      "mb&#x2f;s\n",
      "statistical\n",
      "173\n",
      "ans\n",
      "joiner\n",
      "stdev\n",
      "because:<p><pre><code\n",
      "regression\n",
      "\t\n",
      "gaussian\n"
     ]
    }
   ],
   "source": [
    "pos = ['lda', '']\n",
    "neg = ['']\n",
    "\n",
    "print 'Most similar'\n",
    "print '\\n'.join(most_similar(pos[0]))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
